{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Day 17 Finale: Using Regular Expressions in pandas"
   ]
  },
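  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have studied and thought carefully through the previous 16 days, you should now be able to use **regular expressions** to solve a wide range of complex text-matching problems. Today is the final installment of our **聚沙成堆-正则表达式** check-in study series, so let's look at how to combine regular expressions with `pandas` to get a variety of tasks done."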
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- **str.count**\n",
    "\n",
    "For a string-typed `Series`, or a string column of a `DataFrame`, `str.count` returns the number of times the search pattern occurs in each element, as in the example below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-09-16T14:24:21.584751Z",
     "iopub.status.busy": "2020-09-16T14:24:21.584751Z",
     "iopub.status.idle": "2020-09-16T14:24:21.966717Z",
     "shell.execute_reply": "2020-09-16T14:24:21.966717Z",
     "shell.execute_reply.started": "2020-09-16T14:24:21.584751Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    35\n",
       "1    15\n",
       "2    32\n",
       "dtype: int64"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "s = pd.Series(['国务院总理李克强9月15日晚在人民大会堂出席世界经济论坛全球企业家特别对话会，',\n",
    "               '发表致辞并同企业家代表互动交流。',\n",
    "               '对话会以视频方式举行，世界经济论坛主席施瓦布主持，全球近600位企业家参加。'])\n",
    "\n",
    "# Count the Chinese characters (U+4E00 to U+9FA5) in each element\n",
    "s.str.count('[\\u4e00-\\u9fa5]')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- **str.contains**\n",
    "\n",
    "With `str.contains`, we can check element by element whether each string in a `Series` contains a given pattern:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-09-16T14:24:21.967715Z",
     "iopub.status.busy": "2020-09-16T14:24:21.967715Z",
     "iopub.status.idle": "2020-09-16T14:24:21.975693Z",
     "shell.execute_reply": "2020-09-16T14:24:21.974695Z",
     "shell.execute_reply.started": "2020-09-16T14:24:21.967715Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(['1232.32', '32123.321', '321323.213', '321321|'])\n",
    "\n",
    "# Check whether each element contains any character other than a digit or a decimal point\n",
    "s.str.contains(r'[^\\d.]')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- **str.replace**\n",
    "\n",
    "With `str.replace` we can flexibly find and replace unwanted characters in the raw data, which is very common in data cleaning:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-09-16T14:24:21.976691Z",
     "iopub.status.busy": "2020-09-16T14:24:21.976691Z",
     "iopub.status.idle": "2020-09-16T14:24:21.985667Z",
     "shell.execute_reply": "2020-09-16T14:24:21.984669Z",
     "shell.execute_reply.started": "2020-09-16T14:24:21.976691Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "675999.854"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Strip every character that is not a digit or a decimal point, then convert to numbers and sum\n",
    "pd.to_numeric(s.str.replace(r'[^\\d.]', '', regex=True)).sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- **str.findall**\n",
    "\n",
    "`str.findall` works like `re.findall`, extracting from each element every substring that matches the pattern:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-09-16T14:24:21.986665Z",
     "iopub.status.busy": "2020-09-16T14:24:21.986665Z",
     "iopub.status.idle": "2020-09-16T14:24:21.997634Z",
     "shell.execute_reply": "2020-09-16T14:24:21.996637Z",
     "shell.execute_reply.started": "2020-09-16T14:24:21.986665Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    [国务院总理李克强, 月, 日晚在人民大会堂出席世界经济论坛全球企业家特别对话会]\n",
       "1                            [发表致辞并同企业家代表互动交流]\n",
       "2     [对话会以视频方式举行, 世界经济论坛主席施瓦布主持, 全球近, 位企业家参加]\n",
       "dtype: object"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(['国务院总理李克强9月15日晚在人民大会堂出席世界经济论坛全球企业家特别对话会，',\n",
    "               '发表致辞并同企业家代表互动交流。',\n",
    "               '对话会以视频方式举行，世界经济论坛主席施瓦布主持，全球近600位企业家参加。'])\n",
    "\n",
    "s.str.findall('[\\u4e00-\\u9fa5]+')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- **str.split**\n",
    "\n",
    "With `str.split` we can quickly split each element on a pattern:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-09-16T14:24:21.999628Z",
     "iopub.status.busy": "2020-09-16T14:24:21.998632Z",
     "iopub.status.idle": "2020-09-16T14:24:22.009602Z",
     "shell.execute_reply": "2020-09-16T14:24:22.008605Z",
     "shell.execute_reply.started": "2020-09-16T14:24:21.999628Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    [U.S, stocks opened up modestly in the green a...\n",
       "1    [The Dow Jones industrial average was up about...\n",
       "2    [Most of the Fed representatives included in W...\n",
       "dtype: object"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(['U.S. stocks opened up modestly in the green at the beginning of Wednesday’s trading, as investors wait to hear from Federal Reserve leaders during the afternoon’s committee meeting.',\n",
    "              'The Dow Jones industrial average was up about 55 points, or 0.2 percent, at market open. The S&P 500 was up 12 points, or nearly 0.4 percent, while the tech-heavy Nasdaq composite index was up almost 33 points, or 0.3 percent.',\n",
    "              'Most of the Fed representatives included in Wednesday’s Federal Open Market Committee meeting have spoken publicly about the need for federal aid for Americans struggling during the coronavirus pandemic and recession. The committee is scheduled to offer a monetary policy statement, and Fed Chair Jerome H. Powell will answer questions.'])\n",
    "\n",
    "# Split the English text wherever a comma or period is followed by a space\n",
    "s.str.split(r'[,.] ')"
   ]
  },
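  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `expand=True`, each split segment goes into its own column. The number of columns equals the largest number of segments in any row, and shorter rows are padded with `None`:"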
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-09-16T14:24:22.010600Z",
     "iopub.status.busy": "2020-09-16T14:24:22.010600Z",
     "iopub.status.idle": "2020-09-16T14:24:22.032539Z",
     "shell.execute_reply": "2020-09-16T14:24:22.032539Z",
     "shell.execute_reply.started": "2020-09-16T14:24:22.010600Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>U.S</td>\n",
       "      <td>stocks opened up modestly in the green at the ...</td>\n",
       "      <td>as investors wait to hear from Federal Reserve...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>The Dow Jones industrial average was up about ...</td>\n",
       "      <td>or 0.2 percent</td>\n",
       "      <td>at market open</td>\n",
       "      <td>The S&amp;P 500 was up 12 points</td>\n",
       "      <td>or nearly 0.4 percent</td>\n",
       "      <td>while the tech-heavy Nasdaq composite index wa...</td>\n",
       "      <td>or 0.3 percent.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Most of the Fed representatives included in We...</td>\n",
       "      <td>The committee is scheduled to offer a monetary...</td>\n",
       "      <td>and Fed Chair Jerome H</td>\n",
       "      <td>Powell will answer questions.</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   0  \\\n",
       "0                                                U.S   \n",
       "1  The Dow Jones industrial average was up about ...   \n",
       "2  Most of the Fed representatives included in We...   \n",
       "\n",
       "                                                   1  \\\n",
       "0  stocks opened up modestly in the green at the ...   \n",
       "1                                     or 0.2 percent   \n",
       "2  The committee is scheduled to offer a monetary...   \n",
       "\n",
       "                                                   2  \\\n",
       "0  as investors wait to hear from Federal Reserve...   \n",
       "1                                     at market open   \n",
       "2                             and Fed Chair Jerome H   \n",
       "\n",
       "                               3                      4  \\\n",
       "0                           None                   None   \n",
       "1   The S&P 500 was up 12 points  or nearly 0.4 percent   \n",
       "2  Powell will answer questions.                   None   \n",
       "\n",
       "                                                   5                6  \n",
       "0                                               None             None  \n",
       "1  while the tech-heavy Nasdaq composite index wa...  or 0.3 percent.  \n",
       "2                                               None             None  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.str.split(r'[,.] ', expand=True)"
   ]
  },
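  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Day 17 Quiz\n",
    "\n",
    "For the final exercise of this regex series, we will use the external file `ChnSentiCorp_htl_all.csv`, which contains hotel reviews. For the keywords listed in `match_list`, count the total number of times they occur in each review in the `review` column of the DataFrame, add the result as a new column named `count`, and finally sort the DataFrame by `count` in descending order:"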
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-09-16T14:24:22.038523Z",
     "iopub.status.busy": "2020-09-16T14:24:22.037526Z",
     "iopub.status.idle": "2020-09-16T14:24:22.098362Z",
     "shell.execute_reply": "2020-09-16T14:24:22.097366Z",
     "shell.execute_reply.started": "2020-09-16T14:24:22.037526Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv('ChnSentiCorp_htl_all.csv')\n",
    "\n",
    "match_list = ['床', '卫生间', '餐', '服务', '空调']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Please post a screenshot of your answer in the comments section of this thread."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Count each keyword per review, then add the per-keyword counts together\n",
    "df['count'] = sum(df['review'].str.count(kw) for kw in match_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sort by count in descending order (rows with a missing count go last)\n",
    "df.sort_values('count', ascending=False, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>review</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7259</th>\n",
       "      <td>0</td>\n",
       "      <td>我端午节期间带家人入住了一晚，感觉服务比较差，根本达不到四星的标准，下面我从硬件和服务两方面...</td>\n",
       "      <td>26.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5827</th>\n",
       "      <td>0</td>\n",
       "      <td>之前06年的时候入住感觉非常好，服务、硬件都不错。所以这次带父母、亲戚也入住这里，强力推荐了...</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5786</th>\n",
       "      <td>0</td>\n",
       "      <td>我是5/2日入住的该酒店的大床间，优点：酒店周围比较安静，周边环境还可以。缺点：1）酒店比较...</td>\n",
       "      <td>22.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5663</th>\n",
       "      <td>0</td>\n",
       "      <td>酒店地理位置比较比较偏僻，不过穿过两条马路就是石老人沙滩。酒店前台对客人比较怠慢，特别是女服...</td>\n",
       "      <td>22.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7189</th>\n",
       "      <td>0</td>\n",
       "      <td>先说说硬件设施。我住的是480元的经济套房，预订时我注明要无烟房。入住1109房间时，让我重...</td>\n",
       "      <td>22.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3069</th>\n",
       "      <td>1</td>\n",
       "      <td>我们这次圣诞新年假期原计划是在博鳌看宾斯基住一周的，结果一住就十天了。凯宾斯基饭店的环境是放...</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3070</th>\n",
       "      <td>1</td>\n",
       "      <td>我们这次圣诞新年假期原计划是在博鳌看宾斯基住一周的，结果一住就十天了。凯宾斯基饭店的环境是放...</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6836</th>\n",
       "      <td>0</td>\n",
       "      <td>强烈不推荐本酒店。通过携程预定了一间大床房，房价580元，单早，增加一份早餐68元，当时我们...</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1972</th>\n",
       "      <td>1</td>\n",
       "      <td>每次去杭州入住，有个很重要的原因就是因为四楼餐厅的菜比较好吃，很喜欢。不过，这次有些不快，入...</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6099</th>\n",
       "      <td>0</td>\n",
       "      <td>不知道前面这位弟兄是不是和我一样住的同一天。我是7.24入住的，25日一早怀着极其极其悲愤的...</td>\n",
       "      <td>16.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4227</th>\n",
       "      <td>1</td>\n",
       "      <td>总的先说一句话，徽商国际大酒店实在是太差！！！服务差！！！饭菜质量差！！！这个月初，我带了一...</td>\n",
       "      <td>16.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>632</th>\n",
       "      <td>1</td>\n",
       "      <td>和同事出差北京，事先通过查询后，选择了中船宾馆，于是在携程网上订了中船宾馆5月17日-20日...</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5496</th>\n",
       "      <td>0</td>\n",
       "      <td>一个字：差、差、差，态度差到极点。我们从7月22日住到7月27日，开了两间房，如果不是因为工...</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5400</th>\n",
       "      <td>0</td>\n",
       "      <td>入住经历：由于临时到邯郸，时间较晚，酒店都订满了，只好住这，以前通过携程订住过邯郸宾馆，设施...</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4382</th>\n",
       "      <td>1</td>\n",
       "      <td>因为“五一”期间宜昌的房比较紧，只订到了贵宾房。酒店位置还算可以。离火车站和夷陵广场不算远，...</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1828</th>\n",
       "      <td>1</td>\n",
       "      <td>每年去杭州数次,本次入住友好是看了网友的评价而去体验一下.友好果不负其名,服务质量上乘,从门...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1685</th>\n",
       "      <td>1</td>\n",
       "      <td>如果用一句话来点评：服务远远好过硬件。这次入住了行政湖景房和豪华商务房，感受比较多，就多说几...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2124</th>\n",
       "      <td>1</td>\n",
       "      <td>国庆期间携程特价入住该酒店豪华房，这是第一次入住沈阳的该酒店，以前一直入住青年公园附近的凯宾...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7548</th>\n",
       "      <td>0</td>\n",
       "      <td>空气好是没有话说的。鱼疗减少了很多的死皮。温泉的确是室外的好（虽然不是很深，人是不能站起来的...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4700</th>\n",
       "      <td>1</td>\n",
       "      <td>1、我原本是通过丽江某旅行社订房在香格里拉大道某酒店，到达酒店后，对酒店不甚满意。出租车司机...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7408</th>\n",
       "      <td>0</td>\n",
       "      <td>6月23日中午12点进入酒店，说明是通过携程预定了酒店客房。随后就碰到了很不爽的经历：一个面...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5724</th>\n",
       "      <td>0</td>\n",
       "      <td>2月11日入住下午2点前到达的，我与友人共四人，预定了两间大床房，到达时，前台共有4位服务员...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7046</th>\n",
       "      <td>0</td>\n",
       "      <td>1,房间空调跟电扇一样，不制冷，反映三次服务员答应给看看最后也没来2，因为有时差，凌晨0点入...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5073</th>\n",
       "      <td>1</td>\n",
       "      <td>从携程网上看到大家对海景花园的服务评价很高.我们于7月下旬到青岛避暑，特意入住该店。入住5天...</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1058</th>\n",
       "      <td>1</td>\n",
       "      <td>在义乌临时改变行程，因此凌晨给携程打的电话，印象中大家对此酒店评价不错，于是预定了。第二天在...</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2240</th>\n",
       "      <td>1</td>\n",
       "      <td>装修风格简洁时尚,感觉相当不错的酒店.只是对配套餐厅比较失望,先是衣宵供应时间较不方便,上次...</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5069</th>\n",
       "      <td>1</td>\n",
       "      <td>回来2天了，很忙，没来得及评价，今天周末得闲谈谈入住海景酒店的感触。我是7.25---7.3...</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>731</th>\n",
       "      <td>1</td>\n",
       "      <td>我是7-10号入住的豪华海景大床房，通过携程预定之前，曾给蓝天宾馆打过电话，当时是告知我没房...</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>92</th>\n",
       "      <td>1</td>\n",
       "      <td>我们从6月21日到6月25日在亚太(二期)一共住了四个晚上,原本想在这个酒店住两天,再到亚龙...</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7717</th>\n",
       "      <td>0</td>\n",
       "      <td>通过携程定了950元三间标准间的套餐，同朋友们三家九口一起入住。由于孩子都比较小，我又直接去...</td>\n",
       "      <td>12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5059</th>\n",
       "      <td>1</td>\n",
       "      <td>总体感觉还不错,酒店位置很好,很繁华,逛商场非常方便.</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5057</th>\n",
       "      <td>1</td>\n",
       "      <td>安静乾净价格公道窗外的景致不错如果你要安静的住宿环境可以选择它</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5056</th>\n",
       "      <td>1</td>\n",
       "      <td>如家的标准装潢，没的说，不过还和楼下的一样的老问题，水温是确实的忽冷忽热，隔音也是确实的惨了...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5055</th>\n",
       "      <td>1</td>\n",
       "      <td>酒店环境不错,但离森林国家公园需步行20分钟,稍远.补充</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5040</th>\n",
       "      <td>1</td>\n",
       "      <td>&amp;#35828;&amp;#23454;&amp;#35805;，&amp;#23545;景&amp;#21306;酒店的硬...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5037</th>\n",
       "      <td>1</td>\n",
       "      <td>不能住靠马路的房间,太吵,一晚上没睡好.旁边有个大超市,吃饭方便</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5036</th>\n",
       "      <td>1</td>\n",
       "      <td>酒店的环境非常好，就在山脚下，离景点很近，也安静。但房间设施比较旧，地毯都变色了。除了这一点...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5035</th>\n",
       "      <td>1</td>\n",
       "      <td>环境确实不错，不过普通标间实在是不能住，被子潮得像能拧出水，豪华标间就要好很多了。前台很朴素...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5033</th>\n",
       "      <td>1</td>\n",
       "      <td>1.到饭店时门口数个工作人员，没有一个会打招呼，这种状况持续了两天，直到一天早晨与老外同出电...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5032</th>\n",
       "      <td>1</td>\n",
       "      <td>非常好的饭店,软件硬件都很过得去,值得一住的地方!</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5091</th>\n",
       "      <td>1</td>\n",
       "      <td>朝阳的房间不算大，朝南的房间比较大，有大沙发，但是比较昏暗。大堂比较暗，不敞亮。位置比较靠中...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5092</th>\n",
       "      <td>1</td>\n",
       "      <td>总体来说还可过得去。我们入住时，房间还未打扫，和酒店人员反应，马上处理了。房内设施应该只够三...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5108</th>\n",
       "      <td>1</td>\n",
       "      <td>房间内设施还不错,不过浴室的淋浴喷头不好用,水量不可调节,出水较散,有些不象四星级的标准.周...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6003</th>\n",
       "      <td>0</td>\n",
       "      <td>太烂了，房间小，门口停车还要收费，屋里霉味很重，不知道那些说好的人是怎么体验的。</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6028</th>\n",
       "      <td>0</td>\n",
       "      <td>一个字--差.4星收费,0星service....大堂前有小孩当埸小便--没人管.向前台要纸...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6020</th>\n",
       "      <td>0</td>\n",
       "      <td>非常不好，我们渡过了一个让人难以忍受的纪念日.com/thread-1368750-1-2....</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6017</th>\n",
       "      <td>0</td>\n",
       "      <td>因为客户就在旁边一栋楼，所以就近住了，不过这个地方好象也就只要这么一个大点的宾馆吧，同事说好...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6016</th>\n",
       "      <td>0</td>\n",
       "      <td>这个价格来说酒店糟透了.大堂很差很脏,我订的是商务套房,客厅里居然窗户都没有,灯光昏暗.所有...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6015</th>\n",
       "      <td>0</td>\n",
       "      <td>位置尚可，但距离海边的位置比预期的要差的多，只能远远看大海，没有停车场</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6004</th>\n",
       "      <td>0</td>\n",
       "      <td>号称四星的酒店，可是房间之小，内部设施最多也就相当于Motel168，锦江之星等简易旅馆，真...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5547</th>\n",
       "      <td>0</td>\n",
       "      <td>五一节期间入住，建筑挺宏伟，车开到酒店门口没有人理你或引导你停车到位，询问哪里有停车，被答复...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5109</th>\n",
       "      <td>1</td>\n",
       "      <td>房间的隔音设计实在是不行，墙壁与玻璃外墙之间明显是没堵好，其他还好。</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5546</th>\n",
       "      <td>0</td>\n",
       "      <td>两星级的设施(可能还没有)，四星的收费标准。真不知道四星级是怎么评下来的。和携程宣传有很大的...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5543</th>\n",
       "      <td>0</td>\n",
       "      <td>房间设备比较陈旧，没五星标准客人非常不满意</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5026</th>\n",
       "      <td>1</td>\n",
       "      <td>地理位置距离通常的客运码头比较远，要横穿小岛，预计30-40分钟，酒店处于岛的另一面很安静，...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5115</th>\n",
       "      <td>1</td>\n",
       "      <td>不错，外边看比较旧，但是里面很干净，交通也很方便，退房很快。在房间就可以看音乐喷泉，不错。</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5112</th>\n",
       "      <td>1</td>\n",
       "      <td>地理位置不错，，但是有点旧，，不是节假日基本上不用预定，价格到前台订都一样。</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5111</th>\n",
       "      <td>1</td>\n",
       "      <td>非常好的位置，便于交通，面对星湖。地毯很厚实，感觉不错。</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>距离川沙公路较近,但是公交指示不对,如果是\"蔡陆线\"的话,会非常麻烦.建议用别的路线.房间较...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6374</th>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>7766 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      label                                             review  count\n",
       "7259      0  我端午节期间带家人入住了一晚，感觉服务比较差，根本达不到四星的标准，下面我从硬件和服务两方面...   26.0\n",
       "5827      0  之前06年的时候入住感觉非常好，服务、硬件都不错。所以这次带父母、亲戚也入住这里，强力推荐了...   23.0\n",
       "5786      0  我是5/2日入住的该酒店的大床间，优点：酒店周围比较安静，周边环境还可以。缺点：1）酒店比较...   22.0\n",
       "5663      0  酒店地理位置比较比较偏僻，不过穿过两条马路就是石老人沙滩。酒店前台对客人比较怠慢，特别是女服...   22.0\n",
       "7189      0  先说说硬件设施。我住的是480元的经济套房，预订时我注明要无烟房。入住1109房间时，让我重...   22.0\n",
       "3069      1  我们这次圣诞新年假期原计划是在博鳌看宾斯基住一周的，结果一住就十天了。凯宾斯基饭店的环境是放...   21.0\n",
       "3070      1  我们这次圣诞新年假期原计划是在博鳌看宾斯基住一周的，结果一住就十天了。凯宾斯基饭店的环境是放...   21.0\n",
       "6836      0  强烈不推荐本酒店。通过携程预定了一间大床房，房价580元，单早，增加一份早餐68元，当时我们...   18.0\n",
       "1972      1  每次去杭州入住，有个很重要的原因就是因为四楼餐厅的菜比较好吃，很喜欢。不过，这次有些不快，入...   18.0\n",
       "6099      0  不知道前面这位弟兄是不是和我一样住的同一天。我是7.24入住的，25日一早怀着极其极其悲愤的...   16.0\n",
       "4227      1  总的先说一句话，徽商国际大酒店实在是太差！！！服务差！！！饭菜质量差！！！这个月初，我带了一...   16.0\n",
       "632       1  和同事出差北京，事先通过查询后，选择了中船宾馆，于是在携程网上订了中船宾馆5月17日-20日...   15.0\n",
       "5496      0  一个字：差、差、差，态度差到极点。我们从7月22日住到7月27日，开了两间房，如果不是因为工...   15.0\n",
       "5400      0  入住经历：由于临时到邯郸，时间较晚，酒店都订满了，只好住这，以前通过携程订住过邯郸宾馆，设施...   15.0\n",
       "4382      1  因为“五一”期间宜昌的房比较紧，只订到了贵宾房。酒店位置还算可以。离火车站和夷陵广场不算远，...   15.0\n",
       "1828      1  每年去杭州数次,本次入住友好是看了网友的评价而去体验一下.友好果不负其名,服务质量上乘,从门...   14.0\n",
       "1685      1  如果用一句话来点评：服务远远好过硬件。这次入住了行政湖景房和豪华商务房，感受比较多，就多说几...   14.0\n",
       "2124      1  国庆期间携程特价入住该酒店豪华房，这是第一次入住沈阳的该酒店，以前一直入住青年公园附近的凯宾...   14.0\n",
       "7548      0  空气好是没有话说的。鱼疗减少了很多的死皮。温泉的确是室外的好（虽然不是很深，人是不能站起来的...   14.0\n",
       "4700      1  1、我原本是通过丽江某旅行社订房在香格里拉大道某酒店，到达酒店后，对酒店不甚满意。出租车司机...   14.0\n",
       "7408      0  6月23日中午12点进入酒店，说明是通过携程预定了酒店客房。随后就碰到了很不爽的经历：一个面...   14.0\n",
       "5724      0  2月11日入住下午2点前到达的，我与友人共四人，预定了两间大床房，到达时，前台共有4位服务员...   14.0\n",
       "7046      0  1,房间空调跟电扇一样，不制冷，反映三次服务员答应给看看最后也没来2，因为有时差，凌晨0点入...   14.0\n",
       "5073      1  从携程网上看到大家对海景花园的服务评价很高.我们于7月下旬到青岛避暑，特意入住该店。入住5天...   14.0\n",
       "1058      1  在义乌临时改变行程，因此凌晨给携程打的电话，印象中大家对此酒店评价不错，于是预定了。第二天在...   13.0\n",
       "2240      1  装修风格简洁时尚,感觉相当不错的酒店.只是对配套餐厅比较失望,先是衣宵供应时间较不方便,上次...   13.0\n",
       "5069      1  回来2天了，很忙，没来得及评价，今天周末得闲谈谈入住海景酒店的感触。我是7.25---7.3...   13.0\n",
       "731       1  我是7-10号入住的豪华海景大床房，通过携程预定之前，曾给蓝天宾馆打过电话，当时是告知我没房...   13.0\n",
       "92        1  我们从6月21日到6月25日在亚太(二期)一共住了四个晚上,原本想在这个酒店住两天,再到亚龙...   13.0\n",
       "7717      0  通过携程定了950元三间标准间的套餐，同朋友们三家九口一起入住。由于孩子都比较小，我又直接去...   12.0\n",
       "...     ...                                                ...    ...\n",
       "5059      1                        总体感觉还不错,酒店位置很好,很繁华,逛商场非常方便.    0.0\n",
       "5057      1                    安静乾净价格公道窗外的景致不错如果你要安静的住宿环境可以选择它    0.0\n",
       "5056      1  如家的标准装潢，没的说，不过还和楼下的一样的老问题，水温是确实的忽冷忽热，隔音也是确实的惨了...    0.0\n",
       "5055      1                       酒店环境不错,但离森林国家公园需步行20分钟,稍远.补充    0.0\n",
       "5040      1  &#35828;&#23454;&#35805;，&#23545;景&#21306;酒店的硬...    0.0\n",
       "5037      1                   不能住靠马路的房间,太吵,一晚上没睡好.旁边有个大超市,吃饭方便    0.0\n",
       "5036      1  酒店的环境非常好，就在山脚下，离景点很近，也安静。但房间设施比较旧，地毯都变色了。除了这一点...    0.0\n",
       "5035      1  环境确实不错，不过普通标间实在是不能住，被子潮得像能拧出水，豪华标间就要好很多了。前台很朴素...    0.0\n",
       "5033      1  1.到饭店时门口数个工作人员，没有一个会打招呼，这种状况持续了两天，直到一天早晨与老外同出电...    0.0\n",
       "5032      1                          非常好的饭店,软件硬件都很过得去,值得一住的地方!    0.0\n",
       "5091      1  朝阳的房间不算大，朝南的房间比较大，有大沙发，但是比较昏暗。大堂比较暗，不敞亮。位置比较靠中...    0.0\n",
       "5092      1  总体来说还可过得去。我们入住时，房间还未打扫，和酒店人员反应，马上处理了。房内设施应该只够三...    0.0\n",
       "5108      1  房间内设施还不错,不过浴室的淋浴喷头不好用,水量不可调节,出水较散,有些不象四星级的标准.周...    0.0\n",
       "6003      0           太烂了，房间小，门口停车还要收费，屋里霉味很重，不知道那些说好的人是怎么体验的。    0.0\n",
       "6028      0  一个字--差.4星收费,0星service....大堂前有小孩当埸小便--没人管.向前台要纸...    0.0\n",
       "6020      0  非常不好，我们渡过了一个让人难以忍受的纪念日.com/thread-1368750-1-2....    0.0\n",
       "6017      0  因为客户就在旁边一栋楼，所以就近住了，不过这个地方好象也就只要这么一个大点的宾馆吧，同事说好...    0.0\n",
       "6016      0  这个价格来说酒店糟透了.大堂很差很脏,我订的是商务套房,客厅里居然窗户都没有,灯光昏暗.所有...    0.0\n",
       "6015      0                位置尚可，但距离海边的位置比预期的要差的多，只能远远看大海，没有停车场    0.0\n",
       "6004      0  号称四星的酒店，可是房间之小，内部设施最多也就相当于Motel168，锦江之星等简易旅馆，真...    0.0\n",
       "5547      0  五一节期间入住，建筑挺宏伟，车开到酒店门口没有人理你或引导你停车到位，询问哪里有停车，被答复...    0.0\n",
       "5109      1                 房间的隔音设计实在是不行，墙壁与玻璃外墙之间明显是没堵好，其他还好。    0.0\n",
       "5546      0  两星级的设施(可能还没有)，四星的收费标准。真不知道四星级是怎么评下来的。和携程宣传有很大的...    0.0\n",
       "5543      0                              房间设备比较陈旧，没五星标准客人非常不满意    0.0\n",
       "5026      1  地理位置距离通常的客运码头比较远，要横穿小岛，预计30-40分钟，酒店处于岛的另一面很安静，...    0.0\n",
       "5115      1      不错，外边看比较旧，但是里面很干净，交通也很方便，退房很快。在房间就可以看音乐喷泉，不错。    0.0\n",
       "5112      1             地理位置不错，，但是有点旧，，不是节假日基本上不用预定，价格到前台订都一样。    0.0\n",
       "5111      1                       非常好的位置，便于交通，面对星湖。地毯很厚实，感觉不错。    0.0\n",
       "0         1  距离川沙公路较近,但是公交指示不对,如果是\"蔡陆线\"的话,会非常麻烦.建议用别的路线.房间较...    0.0\n",
       "6374      0                                                NaN    NaN\n",
       "\n",
       "[7766 rows x 3 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
